Opening the Chrysalis: On the Real Repair Performance of MSR Codes
نویسندگان
چکیده
Large distributed storage systems use erasure codes to reliably store data. Compared to replication, erasure codes are capable of reducing storage overhead. However, repairing lost data in an erasure coded system requires reading from many storage devices and transferring over the network large amounts of data. Theoretically, Minimum Storage Regenerating (MSR) codes can significantly reduce this repair burden. Although several explicit MSR code constructions exist, they have not been implemented in real-world distributed storage systems. We close this gap by providing a performance analysis of Butterfly codes, systematic MSR codes with optimal repair I/O. Due to the complexity of modern distributed systems, a straightforward approach does not exist when it comes to implementing MSR codes. Instead, we show that achieving good performance requires to vertically integrate the code with multiple system layers. The encoding approach, the type of inter-node communication, the interaction between different distributed system layers, and even the programming language have a significant impact on the code repair performance. We show that with new distributed system features, and careful implementation, we can achieve the theoretically expected repair performance of MSR codes.
منابع مشابه
Clay Codes: Moulding MDS Codes to Yield an MSR Code
With increase in scale, the number of node failures in a data center increases sharply. To ensure availability of data, failure-tolerance schemes such as ReedSolomon (RS) or more generally, Maximum Distance Separable (MDS) erasure codes are used. However, while MDS codes offer minimum storage overhead for a given amount of failure tolerance, they do not meet other practical needs of today’s dat...
متن کاملA Note on Secure Minimum Storage Regenerating Codes
This short note revisits the problem of designing secure minimum storage regenerating (MSR) codes for distributed storage systems. A secure MSR code ensures that a distributed storage system does not reveal the stored information to a passive eavesdropper. The eavesdropper is assumed to have access to the content stored on l1 number of storage nodes in the system and the data downloaded during ...
متن کاملLocal Codes with Cooperative Repair in Distributed Storage System
Recently, the research on local repair codes is mainly confined to repair the failed nodes within each repair group. But if the extreme cases occur that the entire repair group has failed, the local code stored in the failed group need to be recovered as a whole. In this paper, local codes with cooperative repair, in which the local codes are constructed based on minimum storage regeneration (M...
متن کاملEnabling All-Node-Repair in Minimum Storage Regenerating Codes
We consider the problem of constructing exact-repair minimum storage regenerating (MSR) codes, for which both the systematic nodes and parity nodes can be repaired optimally. Although there exist several recent explicit high-rate MSR code constructions (usually with certain restrictions on the coding parameters), quite a few constructions in the literature only allow the optimal repair of syste...
متن کاملExact Regeneration Codes for Distributed Storage Repair Using Interference Alignment
The high repair cost of (n, k) Maximum Distance Separable (MDS) erasure codes has recently motivated a new class of codes, called Regenerating Codes, that optimally trade off storage cost for repair bandwidth. On one end of this spectrum of Regenerating Codes are Minimum Storage Regenerating (MSR) codes that can match the minimum storage cost of MDS codes while also significantly reducing repai...
متن کامل